ABSTRACT

A key signature of module exchange in the genome 
is phase symmetry of exons, suggestive of exon 
shuffling events that occurred without disrupting 
translation reading frame. At the protein level, intrin- 
sic structural disorder may be another key element 
because disordered regions often serve as 
functional elements that can be effectively 
integrated into a protein structure. Therefore, we 
asked whether exon-phase symmetry in the human 
genome and structural disorder in the human 
proteome are connected, signalling such evolution- 
ary mechanisms in the assembly of multi-exon 
genes. We found an elevated level of structural 
disorder of regions encoded by symmetric exons 
and a preferred symmetry of exons encoding for 
mostly disordered regions (>70% predicted 
disorder). Alternatively spliced symmetric exons 
tend to correspond to the most disordered 
regions. The genes of mostly disordered proteins 
(>70% predicted disorder) tend to be assembled 
from symmetric exons, which often arise by 
internal tandem duplications. Preponderance of 
certain types of short motifs (e.g. SH3-binding 
motif) and domains (e.g. high-mobility group 
domains) suggests that certain disordered 
modules have been particularly effective in exon- 
shuffling events. Our observations suggest that 
structural disorder has facilitated modular 
assembly of complex genes in evolution of the 
human genome. 

INTRODUCTION 

The intron/exon structure of genes bears witness to the 
evolutionary history of their genesis, often revealing
their assembly from pre-existing genetic elements (exons)
encoding for functional units at the protein level (struc- 
tural modules and domains). The assembly procedure 
requires that effective genetic mechanisms operate for 
exon exchange and insertion (exon shuffling), and that 
the module is incorporated into the recipient protein 
without much structural (and functional) conflict. A key 
signature of this assembly mechanism is a bias in exon 
symmetry, i.e. the enrichment for exons flanked by 
introns of the same phase (1-4). In principle, introns can 
split the reading frame between codons (Phase 0) or within 
codons (Phases 1 and 2), which results in nine different 
exonic phase types. It is generally thought that the 
observed genomic bias in exon phases is explained by 
the evolutionary preference for exchanging symmetric 
exons (0,0, 1,1 and 2,2), which do not disrupt the 
reading frame downstream. The successful integration of 
such exons also implies the structural and functional 
compatibility of the encoded regions with the recipient 
proteins. 

Our traditional view of protein structure and function 
assumes that proteins have a well-defined 3D structure; 
therefore, such compatibility is thought to imply that
successful modules correspond to domains and/or second- 
ary structural building blocks (5-7). With the advent 
of recognizing structural disorder in proteins (8-10), this 
view needs to be re-examined and extended. Bioinformatic 
predictions suggest that ~50% of human proteins have at 
least one long disordered region (11), and structural 
disorder plays important roles in proteins of signalling 
and regulatory functions (12). In light of the diverse func- 
tional advantages of structural disorder (13,14) and the 
noted preference for disordered modules in alternative 
splicing (15), we decided to re-visit modularity of human 
protein-coding genes in light of exon symmetry and struc- 
tural disorder. 

Structural disorder is usually higher in eukaryotes than in 
prokaryotes (12,16,17), although it varies highly in both 
phylogenetic groups (18,19). If evolutionary expansion of 
disorder has been driven by module exchange, this general
trend of disorder may also result from the advance of exon 
shuffling in metazoa (7), which has increased complexity 
and functional diversity in multicellular organisms. As 
underscored by its power-law genomic distribution (13), 
structural disorder encompasses modularity at different 
scales (20), corresponding to short-linear motifs (SLiMs) 
or eukaryotic linear motifs (ELMs) (21-23), domains (24) 
or linker regions (25). The intimate connection of structural 
disorder and modularity is perhaps best exemplified by 
scaffolds, complex multi-domain signalling proteins (26). 

To address the involvement of structural disorder in 
modular evolution by exon shuffling, we generated a 
library of human exons and determined their flanking 
intron phases (exon phase) by analysing transcriptome 
data. We observed a significant bias for exon symmetry 
(1-3,5) and elevated levels of disorder in the protein 
regions encoded by symmetric exons. We also found 
that exons encoding for largely disordered regions tend 
to be symmetric, and the genes of mostly disordered 
proteins tend to be assembled from symmetric exons. 
Successive symmetric exons show significant homology 
to each other, and they encode for regions of elevated 
disorder. The facility of molecular assembly from such 
elements is also underlined by a bias for phase symmetry 
and enhanced encoded disorder in alternative (versus con- 
stitutive) exons. The length distribution of exons encoding 
for disordered regions is much broader than that of exons 
encoding for ordered regions, in accord with the previously
observed power-law distribution of disorder (13), which
suggests that these exons may encode for either
short- (motifs) or long- (domains or linkers) functional 
elements (21,22,24). A significant enrichment of certain 
functional motif types in symmetric exons demonstrates 
that shuffling of symmetric exons can incorporate short- 
functional modules into proteins. In all, symmetric exons 
encoding for disordered regions seem to have been amply 
exploited in the generation of multi-exon genes of modular 
proteins, probably contributing to the explosive spread of 
structural disorder early in eukaryotic evolution. 

MATERIALS AND METHODS 

Data preparation 

Human mRNA sequences containing the locations of 
coding sequences and exons were retrieved from the 
NCBI RefSeq database. To filter out redundancy, only
the longest sequence (splice variant) was selected for
every gene identifier. We calculated the phases of the N-
and C-terminal flanking introns for every exon: Phase 0
introns split the reading frame between two codons, 
whereas Phase 1 and Phase 2 introns follow the first and 
second nucleotide of the codon, respectively. Coding 
mRNA sequences were translated into protein sequences, 
and regions corresponding to exons were assumed to start
with the first complete codon and end with the last (even if
interrupted) codon. We only took exons into consideration
if they had determined phases at both termini
because they lie entirely in the coding region (termed
complete exons). Our data set contains 8552 protein
sequences with boundaries and phases of 78 502
complete exons (of 94 471 total exons).
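
The phase bookkeeping described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the Perl code used in the study: the function names and the input format (the number of coding nucleotides contributed by each exon, in transcript order) are assumptions, and the first and last coding exons are simply treated as incomplete.

```python
from itertools import accumulate


def exon_phase_classes(coding_lengths):
    """Return (upstream phase, downstream phase) for each internal coding exon.

    coding_lengths: coding nucleotides contributed by each exon, in
    transcript order (hypothetical input format; the first entry is
    assumed to start at the start codon).
    """
    # Phase of the intron after an exon = coding nucleotides so far, modulo 3:
    # 0 = between codons, 1 = after the first, 2 = after the second nucleotide.
    boundaries = [total % 3 for total in accumulate(coding_lengths)]
    intron_phases = boundaries[:-1]  # no intron follows the last exon
    # Only internal exons have a phased intron on both sides.
    return [
        (intron_phases[k - 1], intron_phases[k])
        for k in range(1, len(coding_lengths) - 1)
    ]


def is_symmetric(phase_class):
    """An exon is symmetric if both flanking introns have the same phase."""
    return phase_class[0] == phase_class[1]
```

For a toy gene with coding lengths 10, 9, 5 and 6 nucleotides, the two internal exons fall into classes (1,1) and (1,0): the 9-nucleotide exon is symmetric because its length is a multiple of three.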

Prediction of structural disorder and disorder definitions 

Structural disorder of proteins and the regions encoded by 
individual exons were predicted with the IUPred algo- 
rithm (14,27). A residue was classified as locally dis- 
ordered if its disorder score is >0.5. For the disorder of 
exons, disorder was predicted for the whole protein, 
which was then split into regions encoded by the individ- 
ual exons and their average disorder was calculated. The 
average disorder of a protein or a region corresponding 
to an exon is the per cent of disordered residues in the 
sequence. The level of disorder for a class of proteins or
exon-encoded regions is the mean of the individual
values. A protein or region is termed 'mostly ordered' or
'mostly disordered' if <30% or >70% of its residues are
predicted to be disordered, respectively.
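
The definitions above translate into a short calculation. The sketch below assumes that per-residue disorder scores for the whole protein are already available as a list of floats (for example, parsed from predictor output); the function names, the half-open residue ranges and the 'intermediate' label are assumptions made for illustration only.

```python
def percent_disordered(scores, start, end, cutoff=0.5):
    """Per cent of residues in scores[start:end] with a score above the cutoff."""
    region = scores[start:end]
    if not region:
        raise ValueError("empty region")
    return 100.0 * sum(score > cutoff for score in region) / len(region)


def disorder_label(percent):
    """Apply the <30% / >70% thresholds used in the text."""
    if percent < 30:
        return "mostly ordered"
    if percent > 70:
        return "mostly disordered"
    return "intermediate"  # label not used in the text; added for completeness
```

Disorder is predicted once for the full-length protein and the scores are then sliced per exon, so that each exon-encoded region is scored in its native sequence context.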

Analysis of exon phases 

The expected frequency of exons with any of the nine 
possible exon-phase combinations was calculated on the 
basis of the observed frequencies of flanking introns of 
the three possible phases. If introns delimited exons purely
by chance, exon-phase combination (i,j) would occur at a
frequency $N_i N_j / N_{\mathrm{total}}$, where $N_i$ and $N_j$ are the occurrences
of phase i and phase j introns, respectively, and $N_{\mathrm{total}}$ is the
total number of exons. The statistical significance of
the difference between expected and observed frequencies
of each exon-phase combination was tested by the χ² test,
applying Bonferroni correction.
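
One plausible reading of this procedure is sketched below. It is an assumption-laden illustration rather than the authors' implementation: the marginal phase counts are taken from the upstream and downstream sides of the exons themselves, each class is tested against all other classes with a one-degree-of-freedom χ² statistic, and the Bonferroni factor is set to the nine phase classes. The function name is hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def phase_class_tests(exon_classes, n_tests=9):
    """Compare observed and expected counts for each exon-phase class.

    exon_classes: list of (upstream phase, downstream phase) tuples.
    Returns {class: (observed, expected, chi2, bonferroni_p)}.
    """
    n = len(exon_classes)
    if n == 0:
        return {}
    observed = Counter(exon_classes)
    upstream = Counter(i for i, _ in exon_classes)
    downstream = Counter(j for _, j in exon_classes)
    results = {}
    for i in range(3):
        for j in range(3):
            # Expected count if the two flanking phases were independent.
            expected = upstream[i] * downstream[j] / n
            if expected in (0, n):
                continue  # class cannot be tested with these data
            obs = observed[(i, j)]
            # One-d.f. chi-square: this class versus all other classes.
            chi2 = (obs - expected) ** 2 * (1 / expected + 1 / (n - expected))
            # Chi-square survival function for 1 d.f., Bonferroni-corrected.
            p = min(1.0, n_tests * erfc(sqrt(chi2 / 2)))
            results[(i, j)] = (obs, expected, chi2, p)
    return results
```

The closed form `erfc(sqrt(chi2 / 2))` is exact for one degree of freedom only, which keeps the sketch free of external dependencies.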

Analysis of adjacent exons 

To test whether the sequences of successive exons of the
same symmetric phase type tend to be more similar to
each other than those of randomly selected exons, we ran
BlastP searches with the sequences of individual exons. We
recorded the occurrence of significant similarity (identity)
hits and calculated their per cent frequency among
neighbouring exons of identical symmetric phase type versus
neighbouring exons of different phase type. We omitted
exon pairs that contain exons without a self-identity hit. To
check significance, we performed a χ² test on the
raw occurrence counts. We also randomly selected
5000 identity values (including zero values) from each of
the two groups (neighbouring exons of the same or of different
phase types) and performed an unpaired t-test. The frequency of
significant similarities and also the sets of identity values
were found to be significantly different. To detect cases
of exon duplication, we also compared each exon with its
neighbouring exon using a BlastP similarity search.

Identification of alternative and constitutive exons 

To compare phase preferences and disorder of alternative 
and constitutive exons, we used all the human mRNA 
sequences in the NCBI RefSeq database of genes encoding
for more than one protein product (not only the longest 
sequence for every gene identifier). Exon phases, exon 
boundaries and structural disorder were calculated as 
described earlier in the text. The resulting data set 
contains 10245 protein sequences with boundaries, phases
and predicted disorder of 108 006 complete exons. An 
exon was classified as 'constitutive', if it can be found in 
all protein isoforms generated from the same gene (having 
the same gene identifier), otherwise it was classified as 
'alternative'. To determine whether an exon occurs in a 
certain isoform, we used BlastP with a 100% identity 
threshold. 
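
The classification rule can be illustrated with a simplified sketch. Note the substitution: instead of BlastP with a 100% identity threshold, the code below uses exact substring matching of the exon-encoded peptide against each isoform, which is only an approximation of the procedure described. The function name and input dictionaries are hypothetical.

```python
def classify_exons(isoforms, exon_peptides):
    """Label each exon as 'constitutive' or 'alternative' within one gene.

    isoforms: dict mapping isoform identifier -> protein sequence.
    exon_peptides: dict mapping exon identifier -> encoded peptide.
    """
    labels = {}
    for exon_id, peptide in exon_peptides.items():
        # Constitutive = the encoded peptide occurs in every isoform.
        in_all = all(peptide in sequence for sequence in isoforms.values())
        labels[exon_id] = "constitutive" if in_all else "alternative"
    return labels
```

Very short peptides may match an isoform by chance under this simplification, so a real analysis would also need to check the position of the match.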

Linear motif and domain prediction 

Linear motifs available in the ELM database (28) were 
predicted for all proteins. For each exon, we then 
calculated what portion of its sequence is covered by 
linear motif(s), termed motif coverage. Occurrence and 
location of domains were computed by the hmmsearch 
algorithm using the PFAM A seed alignment database 
(29) for all proteins. For each exon, we calculated what 
portion of its sequence is covered by PFAM domain(s), 
termed domain coverage. We also looked for the most 
significantly overrepresented cases of motifs and 
domains in disordered regions of proteins encoded by 
symmetric exons. We counted the occurrence of motifs 
and domains in different types of (e.g. all, symmetric, sym- 
metric disordered and so forth) exons, and the resulting 
values were normalized to the total length (number of 
amino acids) of exons of the corresponding type. Specifically, we
normalized the occurrence numbers to 1000 and 10 000
amino acids for motifs and for domains, respectively
(because of the high proportion of false-positive hits,
there are many more predicted motifs than domains).
Overrepresentation is the ratio of the normalized occur- 
rence of a motif or domain in a certain exon type versus all 
exons. We only took into account motifs that occur >10 
times in symmetric, disordered, short exons and domains 
that occur >5 times in symmetric disordered exons. 
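
The normalization and the overrepresentation ratio amount to simple arithmetic, sketched here under the definitions given in this section. The function and parameter names are assumptions; `per` would be 1000 for motifs and 10 000 for domains, although it cancels in the ratio.

```python
def normalized_occurrence(count, total_residues, per=1000):
    """Occurrences per `per` amino acids of exon-encoded sequence."""
    return count * per / total_residues


def overrepresentation(count_subset, residues_subset,
                       count_all, residues_all, per=1000):
    """Normalized occurrence in one exon type relative to all exons."""
    subset = normalized_occurrence(count_subset, residues_subset, per)
    background = normalized_occurrence(count_all, residues_all, per)
    return subset / background
```

For example, a motif found 30 times in 20 000 residues of one exon type (1.5 per 1000) and 200 times in 400 000 residues of all exons (0.5 per 1000) would have an overrepresentation of 3.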

Search for select examples of proteins assembled from 
disordered modules 

To select proteins of high disorder that are assembled from 
modules encoded by symmetric exons, we ranked all 
human proteins by their number of successive exons with 
the same (symmetric) phase and selected those that have a 
predicted disorder >40%. From these, we selected and 
discuss in detail some biologically interesting examples. 
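
The ranking step can be sketched as follows. This is a minimal illustration with hypothetical function names and input format (a dictionary of per cent disorder and exon-phase classes per protein), not the original code. It relies on the fact that successive symmetric exons necessarily share one phase class, so runs of identical classes can be counted directly.

```python
from itertools import groupby


def longest_symmetric_run(exon_classes):
    """Length of the longest run of successive exons of one symmetric class."""
    best = 0
    for phase_class, run in groupby(exon_classes):
        if phase_class[0] == phase_class[1]:
            best = max(best, len(list(run)))
    return best


def rank_candidates(proteins, min_disorder=40.0):
    """Rank proteins above the disorder cutoff by their longest symmetric run.

    proteins: dict name -> (per cent disorder, list of exon-phase classes).
    """
    kept = [
        (longest_symmetric_run(classes), name)
        for name, (disorder, classes) in proteins.items()
        if disorder > min_disorder
    ]
    return [name for _, name in sorted(kept, key=lambda item: -item[0])]
```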

Statistical analysis and programming 

All programs were written in Perl. The IUPred software
was obtained from the authors and was compiled and
executed locally. To check significance when
comparing sets of values, we randomly selected 5000
values from each exon set and performed an
unpaired t-test. When comparing occurrences,
we used the χ² test, with Bonferroni correction if needed.

RESULTS 

Exon-phase bias in the genome 

First, we asked whether the intron-phase combinations of
our selected exons show the characteristic bias previously
observed (3,5). Our data set contains 78 502 complete
exons flanked by 86 487 introns on both sides in 8552
genes/proteins, i.e. 9.2 complete exons per gene on average.
We first counted the different types of introns (0, 1, 2) and
calculated their per cent occurrence; in accordance with the
literature (1,2), the frequencies of introns of different phases
are significantly different: 46.28% (40 023) for Phase 0, 32.65%
(28 238) for Phase 1 and 21.07% (18 226) for Phase 2.
From these figures, we calculated the expected occurrences
of exons of the nine different phase classes and compared
them to the actual observations (Table 1): all three
symmetric exon types (0,0, 1,1 and 2,2) occur more frequently
than expected by chance. The differences are highly
significant (P < 0.0001), except for class 2,2, probably
because of its low incidence, as shown by χ² statistical
analysis with Bonferroni correction.

Exon-phase symmetry and structural disorder 

Next, we asked whether structural disorder of encoded 
regions distinguishes symmetric and asymmetric exons. 
To this end, we predicted disorder for whole proteins, 
then split them into regions encoded by the individual 
exons and calculated their average disorder (Table 1). 
The average disorder of all regions corresponding to
symmetric exons, 21.23% (24.16% for phase 1,1 exons),
is significantly higher (P < 0.0001, using unpaired t-test)
than that of asymmetric exons, 17.17% (Figure 1A).
These values suggest that the evolutionary mechanism 
that preferred symmetric versus asymmetric exon types 
also favoured protein disorder. We also calculated the 
average length of exons of the different phase classes. 
Here, no conspicuous differences in the averages are 
observed, with the exception of exons 1,1, which are 
significantly longer than all others. The reason for this
difference is not clear.

Although the observed differences in the disorder
content of exons of different phase classes are significant,
they all tend to have values <25%, which does not reveal the
potential use of shuffling exons of largely disordered
regions in modular assembly. To address this issue, we
distinguished exons of mostly ordered and mostly disordered
regions by a threshold of predicted disorder (<30 and
>70%, respectively) and calculated their per cent
occurrence (Table 2 and Figure 1B). A large proportion
of exons of mostly disordered regions are symmetric
(0,0 and 1,1), exceeding even the biased occurrence of
all symmetric exons (all symmetric, expected: 36.52%;
all symmetric, observed: 41.48%; all symmetric, mostly
disordered: 49.37%). The difference (between the
occurrence of all symmetric and mostly disordered symmetric
exons) is highly significant (P < 0.0001), as shown by χ²
statistical analysis.
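
The expected share of symmetric exons quoted above can be reproduced from the intron-phase frequencies reported for this data set, assuming that the phases on the two sides of an exon are independent. The symbols $p_0$, $p_1$ and $p_2$ (the fractions of Phase 0, 1 and 2 introns) are introduced here only for illustration:

```latex
\[
p_{\mathrm{sym}}^{\mathrm{exp}} = p_0^2 + p_1^2 + p_2^2
  = 0.4628^2 + 0.3265^2 + 0.2107^2
  \approx 0.2142 + 0.1066 + 0.0444
  = 0.3652
\]
```

This corresponds to the expected 36.52%, against which the observed 41.48% and the 49.37% for mostly disordered regions are compared.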

We were also curious to know whether the increased 
disorder associated with symmetric exons reflects the 
disorder of only the region corresponding to the exon, 
the entire protein or both because there are several 
possible scenarios for the use of symmetric disordered 
modules in the assembly of proteins encoded by multi- 
exon genes. It is possible that exons encoding for these 
are incorporated into the gene of a mostly ordered or 
Table 1. Observed and expected occurrence (of exons), and average disorder and length of regions (encoded by exons) of different phase classes
Figure 1. Correlation of structural disorder and exon-phase bias. Symmetric exons have a preference for structural disorder. (A) Average structural
disorder (per cent of amino acids predicted to be disordered) of all human exons, of all the six classes of asymmetric exons, of three classes of
symmetric exons (phases 0,0; 1,1; and 2,2) and of consecutive symmetric exons of the same phase. All four values are significantly different from each
other (with unpaired t-test). (B) Occurrence of symmetric (dark grey) and asymmetric (light grey) exons in humans [all exons, mostly ordered (<30%
disorder) and mostly disordered (>70% disorder) exons]. (C) Occurrence of symmetric (dark grey) and asymmetric (light grey) exons in all human 
proteins and mostly ordered (<30% disorder) and mostly disordered (>70% disorder) proteins. (D) Average disorder of regions encoded by 
asymmetric and symmetric exons in short (encoded by a gene of maximum two exons) and long (encoded by a gene of at least five exons) 
proteins. Error bars represent standard errors of mean. 
Table 2. Per cent occurrence and average length of mostly ordered and mostly disordered regions encoded by exons of different phase classes, or
exons located in the genes of mostly ordered and mostly disordered proteins 
mostly disordered protein, or both. To this end, we
correlated exon-phase preferences with the predicted 
disorder of the entire protein, distinguishing mostly 
ordered (<30% disorder) and mostly disordered (>70% 
disorder) proteins (Table 2). The preference for symmetric 
exons is even more striking here than in the previous cases: 
55.99% of the exons of mostly disordered proteins are 
symmetric, highly significantly more than 40.72% of 
exons of mostly ordered proteins (P < 0.0001) (Table 2 
and Figure 1C), suggesting prevalent modular evolution- 
ary assembly mechanism relying on the shuffling of sym- 
metric exons (see also 'Select Examples of Proteins 
Assembled from Disordered Modules Encoded by 
Symmetric Exons' section). Interestingly, the level of
disorder of regions of symmetric exons strongly correlates
with the overall level of disorder of the protein
(P < 0.0001) (Supplementary Figure S1). In fact,
disorder corresponding to all/asymmetric exons also
strongly correlates with the overall level of disorder of
the protein (P < 0.0001), but the average disorder is
always higher in the case of symmetric exons, which
may imply that in the evolutionary construction of
ordered proteins, mostly ordered symmetric exons have
been used (5,6), whereas in the evolutionary construction
of disordered proteins, mostly disordered symmetric exons
have been preferred.

Modular assembly of proteins 

The previous results indicate that two complementary 
strategies might have been used for the construction of 
modular proteins, either dominated by the assembly of 
ordered modules or disordered modules. Of course, the 
slight preferences do not exclude the complementary use 
of the two types of modules, which would occur if 
modular proteins composed of ordered domains and dis- 
ordered linkers (26) are assembled, for example. In light of 
the preference of disordered proteins for the presence of 
repetitive regions (30), however, this observation is com- 
patible with internal tandem duplications of exons. To 
check whether the combination of phase symmetry and 
structural disorder has played a role in generating
multi-exon genes by this mechanism, first we asked
whether structural disorder associated with successive 
symmetric exons (necessarily of the same phase class) 
tends to correlate. We found that this is the case: when 
at least three symmetric exons are found in succession, the 
corresponding level of predicted disorder is significantly 
higher than the average (P = 0.0001; using unpaired
t-test) (Table 1, Figures 1A and 2A and Supplementary
Table S1). Moreover, both among exons of local
disorder and exons encoding for disordered proteins,
successive symmetric exons occur with a significantly
elevated frequency (P < 0.0001 in both cases, with χ²
test) (Figure 2B).

To provide evidence that internal duplications have
been preferred in these cases, we examined whether
the sequences of subsequent exons of the same symmetric 
phase class tend to be more similar than those of random 
consecutive exons. We compared the per cent occurrence 
of segments of significant similarity by BlastP search in 
different exon sets (Figure 2C and Supplementary Table 
S2). We found that among consecutive symmetric exons, 
similarity occurs much more frequently than among 
random consecutive exons (7.8 versus 2.6%) and even 
more than among asymmetric consecutive exons 
(1.19%). These values are highly significantly different 
from each other (see 'Materials and Methods' section). 
This preference is even more evident if we compare the 
phase classes of neighbouring exons having highly similar 
segments (identity is >70%): in 108 of 118 cases (91.5%), 
consecutive exons have the same (symmetric) phase type. 

These results are indicative of the preferred evolution of
complex genes by internal duplications of exons, especially
disordered symmetric exons, which would suggest that
it can be observed much more frequently in genes
constituted of more exons. First, we checked whether
long genes are long because of exon duplication. BlastP
searches were carried out to identify homology, and we
compared the frequency of similarity of adjacent exons in
'short' (≤2 exons) and 'long' (≥5 exons) genes. Of course,
among short genes, only two-exon genes can have exon
duplication. We found that long genes have higher
Figure 2. Disorder and occurrence of adjacent exons. (A) Average
structural disorder (per cent of predicted disordered amino acids)
encoded by all human exons, at least three consecutive (successive)
exons with the same phase and all other, not consecutive exons. All
three values are significantly different (with unpaired t-test).
(B) Occurrence of at least three successive exons with the same phase
for all exons, mostly ordered (mod, <30% disorder) exons, mostly
disordered (mdd, >70% disorder) exons and such exons located in
mostly ordered (mod) or mostly disordered (mdd) proteins.
(C) Frequency of sequentially similar segments in all adjacent exons
and in adjacent exons with the same or different phase types.



frequency of similarity of adjacent exons indicative of
exon duplication (2.6% of neighbouring exons in 9.3%
of proteins versus only 1.8% of two-exon genes; the
differences are significant, P < 0.0001, shown by χ² analysis).
Sixty-five per cent of these presumably duplicated exons
have the same phase type. Long genes can arise not only
by internal duplications but also by shuffling of exons
between distinct genes, both of which can be signalled by
exon-phase symmetry. To this end, we compared the
occurrence of symmetric exons associated with disorder
in 'short' (≤2 exons) and 'long' (≥5 exons) genes
(Supplementary Table S3 and Figure 1D). In the case of
short genes, symmetric exons encode more order than
asymmetric exons (although the difference in disorder
between symmetric and asymmetric exons is not
significant for short genes), which is in contrast to
long genes (where the difference is significant with
unpaired t-test, P = 0.001). In the case of the latter, the
difference is most conspicuous if genes of mostly symmetric
exons (>80% symmetric exons, 45.21% disorder) are
compared with genes constructed of preferentially
asymmetric exons (<20% symmetric exons, 16.04% disorder).

Phase and disorder of alternative and constitutive exons 

The evolutionary (genomic) process of exon shuffling in the 
assembly of a gene has strong conceptual parallels with 
alternative splicing in mRNA maturation, when an exon 
is inserted into (or removed from) a mature transcript. 
Although the underlying mechanisms are entirely different, 
the possible consequences of the event for the translational
frame and the structural integrity of the protein are the
same; in fact, the two processes might be evolutionarily
connected (31-33). In accord, we expected a similar
evolutionary preference for phase symmetry and structural
disorder in exons subject to alternative splicing. Here, we 
identified the phase class of all exons, which could be 
clearly identified to be expressed constitutively or alterna- 
tively (cf. 'Materials and Methods' section) and predicted 
their disorder (Table 3; note that the values here differ somewhat
from those for 'all' exons studied previously). In accord with
previous studies (15,23), alternative exons correspond to 
significantly more disordered regions than constitutive 
exons (29.7 versus 18.8%). The difference is even more 
pronounced in case of symmetric exons (32.46 versus 
19.79%). Differences are significant (P < 0.0001) in both 
cases, using unpaired t-test.

Interestingly, the preference for structural disorder is 
much more pronounced in the case of alternative versus 
constitutive exons than symmetric versus asymmetric con- 
stitutive exons. As will be discussed in detail, this differ- 
ence comes from the pressure of the gene (product) to be 
viable both with and without the inclusion of the exon in 
the case of alternative splicing, which makes the situation 
much more demanding than in the case of a constitutive 
exon, which was only created and fixed once. 

Symmetric exons as functional modules 

The observed bias for phase symmetry and structural 
disorder may not necessarily imply a functional role for
the region encoded by the exon, only that these exons 
Table 3. Disorder and occurrence of regions encoded by alternative and constitutive exons



Phase type | Disorder (%), all | Disorder (%), alternative | Disorder (%), constitutive | Occurrence (%), all | Occurrence (%), alternative | Occurrence (%), constitutive
(0,0) | 21.48 | 29.95 | 19.25 | 24.3 | 22.38 |
(0,1) | 22.13 | 29 | 20.15 | 12.43 | 12.3 | 12.47
(0,2) | 16.47 | 24.18 | 14.78 | 9.65 | 7.68 | 10.23
(1,0) | 21.45 | 28.35 | 19.94 | 12.64 | 10.04 | 13.4
(1,1) | 26.77 | 34.46 | 22.46 | 13.4 | 21.28 | 11.09
(1,2) | 20.2 | 28.28 | 17.83 | 6.69 | 6.7 | 6.69
(2,0) | 16.89 | 22.33 | 15.8 | 9.89 | 7.34 | 10.63
(2,1) | 21.98 | 28.46 | 19.95 | 6.17 | 6.49 | 6.07
(2,2) | 21.26 | 34.8 | 16.22 | 4.83 | 5.79 | 4.55
All | 21.26 | 29.7 | 18.8 | 100 | 100 | 100
Symmetric | 23.12 | 32.46 | 19.79 | 42.5 | 49.4 | 40.5
Asymmetric | 19.89 | 27.01 | 18.12 | 57.5 | 50.6 | 59.5

are used because they do not disrupt either the translation 
reading frame or the structural integrity of the recipient 
protein. Structural disorder, however, is known for its 
manifold functional advantages (8-10), modularity (20) 
and association with motifs (22,23,34) and domains (24), 
which all suggest its potential for functional integration 
into proteins. To address this feature, we first scrutinized 
the length distribution of symmetric exons, which appar- 
ently have a preference to be used in the construction 
of modular genes (Figure 3). Exons encoding for mostly 
ordered, symmetric (<30% disorder) and mostly dis- 
ordered, symmetric (>70% disorder) regions have a 
rather similar average length [46 and 56 residues, respect- 
ively, there is no significant difference (P = 0.11), 
cf. Table 2], but they differ in length distribution, with 
symmetric exons corresponding to disordered regions 
having significantly higher frequency in the short (<25 
residues) and long (>75 residues) length range 
(Figure 3A). A similar difference can be seen when sym- 
metric and asymmetric exons in this latter class are 
compared, with an excess of symmetric exons in the 
short length range (Figure 3B). These differences might 
imply that exons of order represent more uniform
building blocks [domains, in accord with prior inferences 
(3,5,6)], whereas exons of disorder are much more spread 
out because they represent three different types of func- 
tional modules (motifs, linkers and domains). Although 
comparison of the motif and domain 'content' of exons 
of different lengths is not conclusive enough, the 
overrepresentation of certain motif and domain types 
points to this direction (see later in the text). 

To address the possibility of these associations, we 
elaborated on the tendency of short-symmetric exons of 
disorder (<25 residues) to encode short-functional motifs 
ELMs (28) or SLiMs (35), and that of longer ones (>70 
residues) to be either linker regions or domains (24). 
To this end, we have searched whether there is a length 
dependence of the occurrence of motifs and domains in 
these exons, based on the ELM database and PFAM 
families, respectively. Apparently, predicting linear 
motifs is fraught with very high-false-positive rates 
(average coverage is 64% for all exons and 71% for 
exons of mostly disordered regions, without any clear 
length preferences), which precludes straightforward
generalizations. On analysing PFAM data, we found
that occurrence of domains in symmetric exons of 
disorder is more frequent within short exons, especially 
in the phase 0,0 type (Supplementary Figure S2). 

That is, based solely on coverage, no clear distinction 
can be made between the occurrence of motifs and domains
in short- versus long-symmetric exons encoding for dis- 
ordered regions. The power of modularity in the 
assembly process, however, can be clearly demonstrated 
by the significant overrepresentation (compared with their 
expected occurrence) of certain functional motifs (Table 4 
for top 10 hits, for further examples, see Supplementary 
Table S4) and PFAM domains (Table 5 for top 10 hits, for 
further examples, see Supplementary Table S5) in symmet- 
ric exons encoding for mostly disordered regions. In case 
of motifs, SH3-binding regions, a range of phosphoryl- 
ation and some other post-translational modification 
sites are found to preferentially occur in these regions. 
Most of these motifs contribute novel protein-protein 
interaction sites (partner of SH3 domain and nuclear lo- 
calization receptor) or post-translational modification site 
(sumoylation site), i.e. they extend the functionality of the 
recipient protein in a simple but straightforward way, in 
accord with recent results showing that disordered regions 
are often involved in rewiring protein-protein interaction 
networks (36). That is, their enrichment is a strong indi- 
cation that their presence provided an adaptive advantage 
to the gene after shuffling, in accord with our conjecture 
that not only structural but also functional compatibility 
with the recipient protein drives the fixation of a shuffled
exon. In the case of domains, the picture is even more varied
because domains contribute novel functionality to the
recipient protein in a more subtle and complex way:
often they extend the function of the protein in complex
ways, such as enabling chromatin remodelling
(HMG14 and HMG17), inhibition of an enzyme
(calpain inhibitor), regulation of ribosomal DNA gene
transcription (e.g. Treacher Collins syndrome protein) or
the elastic function of titin [PPAK motif (37)]. They may
also be involved in protein-protein interactions, as
exemplified by the collagen triple helix repeat, which is a
robust example of changing the oligomerization status of
the given protein. In all, these examples also confirm that
evolutionary fixation of a novel shuffled exon not 

Figure 3. Length distribution of exons. (A) The length distribution of symmetric, mostly ordered (<30% disorder, light grey) and symmetric, mostly
disordered (>70% disorder, dark grey) exons. The two distributions have similar means (46 and 56 residues, respectively), with exons corresponding 
to symmetric disordered regions displaying a broader distribution that has a significant excess in the regions <25 and >75 residues. (B) The length 
distribution of mostly disordered symmetric (dark grey) and mostly disordered asymmetric (light grey) exons shows the same difference, asymmetric 
disordered exons having a tail in the long region. 

Table 4. List and occurrence of the top 10 ELM motifs overrepresented in short symmetric exons encoding for disordered regions
(Overrepresentation: overrepresentation of motif in symmetric, disordered, short exons)

ELM identifier       Overrepresentation  Motif regex                                                       Motif description
LIG_SH3_1            7.86                [RKY]..P..P                                                       This is the motif recognized by class I SH3 domains.
LIG_RGD              4.65                RGD                                                               The RGD motif can be found in many proteins of the extracellular matrix and it is recognized by different members of the integrin family. The structure of the 10th type III module of fibronectin has shown that the RGD motif lies on an exposed flexible location.
LIG_SH3_3            4.12                ...[PV]..P                                                        This is the motif recognized by those SH3 domains with a non-canonical class I recognition specificity.
LIG_EVH1_1           4.04                [FILVY].{0,1}P.[PAILSK]P                                          Proline-rich motif binding to signal transduction class I EVH1 domains.
LIG_SH3_2            3.79                P..P.[KR]                                                         This is the motif recognized by class II SH3 domains.
MOD_SUMO             3.4                 [VILMAFP](K).E                                                    Motif recognized for modification by SUMO-1.
LIG_TRAF6            3.33                ..P.E..[FYWHDE].                                                  TRAF6-binding site. Members of the tumour necrosis factor receptor (TNFR) superfamily initiate intracellular signalling by recruiting the C-domain of the TNFR-associated factors (TRAFs) through their cytoplasmatic tails.
LIG_SH3_4            3.22                KP..[QK]...                                                       This is the motif recognized by those SH3 domains with a non-canonical class II recognition specificity.
TRG_NLS_MonoCore_2   3.11                [^DE]((K[RK])|(RK))[KRP][KR][^DE]                                 Monopartite variant of the classical basically charged NLS. Strong core version.
TRG_NLS_Bipartite_1  2.73                [KR][KR].{7,15}[^DE]((K[RK])|(RK))(([^DE][KR])|([KR][^DE]))[^DE]  Bipartite variant of the classical basically charged NLS.
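
As a rough illustration of how the regular expressions listed in Table 4 can be applied, the following Python sketch counts (possibly overlapping) matches of a few of the listed patterns in a protein sequence and computes a simple enrichment ratio. The example sequence, the function names and the definition of the expected frequency (motif density in a background set) are assumptions made for illustration only; they are not the procedure used to derive the values in the table.

```python
import re
from typing import Dict

# A few motif patterns copied from Table 4 (ELM regular expressions).
ELM_PATTERNS: Dict[str, str] = {
    "LIG_SH3_1": r"[RKY]..P..P",
    "LIG_RGD": r"RGD",
    "LIG_SH3_2": r"P..P.[KR]",
    "MOD_SUMO": r"[VILMAFP](K).E",
}


def count_motifs(sequence: str, patterns: Dict[str, str] = ELM_PATTERNS) -> Dict[str, int]:
    """Count motif occurrences in a protein sequence, including overlapping ones."""
    counts = {}
    for name, pattern in patterns.items():
        # A zero-width lookahead lets matches start at every position,
        # so overlapping instances are counted as well.
        regex = re.compile(f"(?=(?:{pattern}))")
        counts[name] = sum(1 for _ in regex.finditer(sequence))
    return counts


def enrichment(target_hits: int, target_length: int,
               background_hits: int, background_length: int) -> float:
    """Ratio of motif density in a target set to that in a background set.

    Assumed definition: hits per residue in the target divided by
    hits per residue in the background.
    """
    if background_hits == 0 or target_length == 0:
        raise ValueError("background hits and target length must be non-zero")
    return (target_hits * background_length) / (background_hits * target_length)


if __name__ == "__main__":
    # Hypothetical peptide, not taken from any protein in the study.
    peptide = "AARPLPPLPAARGDAAVKAEAA"
    print(count_motifs(peptide))
    print(enrichment(target_hits=2, target_length=100,
                     background_hits=1, background_length=100))
```

Short patterns such as RGD match frequently by chance, which is why any such count is only meaningful relative to an expected occurrence.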



Table 5. List and occurrence of the top 10 PFAM domains of predicted disorder, encoded in symmetric exons
(Overrepresentation: overrepresentation of domain in symmetric disordered exons)

Pfam ID       Overrepresentation  Domain description
PF01101.12    15.11               HMG14 and HMG17
PF03546.8     15.07               Treacher Collins syndrome protein Treacle
PF00748.13    15.07               Calpain inhibitor
PF12301.2     14.13               CD99 antigen like protein 2
PF01391.12    13.65               Collagen triple helix repeat (20 copies)
PF02818.9     13.58               PPAK motif
PF05279.5     12.59               Aspartyl β-hydroxylase N-terminal region
PF12235.2     9.07                Fragile X-related 1 protein C terminal
PF06583.6     9.07                Neogenin C-terminus
PF06464.5     8.13                DMAP1-binding domain



only relies on its compatibility with the gene and
structural integrity of the order but also, at least as much,
on the functional extension of the resulting gene
product (30).

Select examples of proteins assembled from disordered 
modules encoded by symmetric exons 

Behind all the predictions, correlations and general
observations, there are individual proteins, the structural
disorder of which is experimentally characterized and is
shown to be involved in function. Studying the gene
structure of these proteins provides further evidence for the
evolutionary agility of symmetric exons encoding for
disordered regions. Here, we present a few select examples
that demonstrate how these modularity principles apply
to the assembly of disordered proteins (Figure 4 and
Supplementary Table S6).

In three cases (Figure 4A), we show fully disordered
proteins encoded by multi-exon genes with a strong bias
for symmetric exons of the same phase class. Their exons
correlate with functional regions/domains of the proteins,
which argues that the success of shuffling of the exon also
relied on the productive incorporation of the encoded
disordered structural/functional module into the protein.
(i) Calpastatin (CST) is the fully disordered (38,39) inhibitor
of the calcium-activated cysteine protease calpain
that undergoes limited induced folding on inhibition
(38). The gene of the protein is assembled from class 1,1
symmetric exons. The protein has four calpain-inhibitory
domains (I through IV) and an N-terminal L domain of
unrelated function, perfectly matched by the exonic
structure of the gene. (ii) Microtubule-associated protein 2
(MAP2) is a fully disordered protein (40) that belongs to
the protein family regulating microtubule assembly.
Tubulin-bound MAP2 exhibits chaperone-like activity,
likely because of the N-terminal domain containing
several patches of acidic amino acids. The protein also
has a projection domain, a proline-rich region (P) and
four tubulin-binding repeats (R1-R4) in the C-terminal
region (41). Its gene has been assembled from type 1,1
symmetric exons matching the functional regions of the
protein. The two 1,0-0,1 exon pairs at the C-terminus
can probably also be explained by the insertion of a
phase 0 intron into an ancestral symmetric 1,1
exon followed by the duplication of this exon pair.
(iii) Methyl-CpG-binding protein 1 (MBD1) is the

Figure 4. Select examples of proteins assembled from disordered modules encoded by symmetric exons. (A) A few long disordered proteins, encoded
by genes that have an exon/intron structure indicative of modular assembly from symmetric disordered exons. In each case, the domain structure of 
the protein based on structural and functional data is outlined, and phases of introns separating exons are indicated above the domain structure. CST 
is the inhibitor of calpain, a calcium-activated cysteine protease. The inhibitor has four inhibitory domains (I, II, III and IV) (olive), each having 
three conserved subdomains (black) and an additional L-domain (purple) of unrelated function. MAP2 belongs to the microtubule-associated protein 
family. Its N-terminal domain (dark green) exhibits chaperone-like activity, it has a proline-rich region (P) (light blue) and four tubulin-binding 
repeats (R1-R4) (red) in the C-terminal region, connected by a middle projection domain (orange). MBD1 is a member of a family of nuclear proteins.



(continued) 
member of a family of nuclear proteins that have a
methyl-CpG-binding domain (MBD). The protein is
fully disordered (42); it can bind methylated DNA, and
it can repress transcription from methylated gene
promoters. MBD1 contains multiple domains: an
N-terminal MBD, three CXXC-type zinc-finger domains
that can bind non-methylated CpG dinucleotides and a
transcriptional repression domain (TRD) at the
C-terminus (43). Almost the entire protein has been
assembled from type 0,0 exons (some tend to be
ordered), reflecting the domain organization of the
protein.

In Figure 4B, we show an entire family of homologous 
fully disordered (44-46) proteins that diverged from an 
ancestral gene by the module exchange mechanisms 
based on symmetric exons. These proteins are involved 
in biomineralization in the formation of teeth and/or 
bone. Enamel matrix proteins amelogenin (AMELX/Y), 
ameloblastin (AMBN) and enamelin (ENAM) organize 
and regulate hydroxyapatite crystallization in the enamel 
organ (47). Bone and teeth proteins dentin sialophospho- 
protein (DSPP), dentin matrix acidic phosphoprotein 1 
(DMP1) and bone sialoprotein (BSP) belong to the 
SIBLING (small integrin-binding ligand, N-linked glyco- 
protein) family (44). At the primary amino acid sequence 
level, these proteins show little similarity, but their func- 
tional relatedness, modular organization and exon/intron 
structure point to their common origin. It is thought that 
the entire family diverged from a common ancestor, 
SPARC (osteonectin, not included in our analysis). 
Apparently, the family evolved by gene duplication, the 
addition of class 0,0 modules and internal duplications, 
which have led to two clusters: the enamel protein genes 
(AMBN, ENAM and AMEL) and the bone-dentin 
protein genes (DSPP, DMP1, integrin-binding 
sialoprotein (IBSP), matrix extracellular phosphogly- 
coprotein (MEPE) and secreted phosphoprotein 1 
(SPP1)) (47,48). Functional modules and exons are in 
close correspondence and show the gradual evolution 
and diversification of the family. The first coding exon encodes
the signal peptide and the first two amino acids of the
mature protein (SP + AA), whereas exon 2 usually
contains the consensus sequence (SXE) for casein kinase
II phosphorylation. Exon 3 in the SIBLING family is
usually somewhat proline-rich (PP), and among the
acidic proteins it is the only significantly positively charged
domain. Exon 4 usually contains another casein kinase II
site, and a unique integrin-binding Arg-Gly-Asp (RGD)
motif has been added on the last exon. Enamel matrix
proteins, however, contain only the first two domains,
followed by the exon homologous to exon 2 of DSPP
repeated as many as 10 times (in AMBN) (47,48).



DISCUSSION 

Modularity is a powerful principle in the evolution of 
proteins of novel/altered activity because it facilitates the 
generation of novel combinations of already existing 
structural and functional elements. In accord, modularity 
is apparent both at the genome (exons, genes) and 
proteome (structural building blocks, motifs, domains) 
levels. It has been suggested that the split nature of 
genes facilitates the creation of novel genes through 
re-shuffling of exons via intronic recombination (49). 
Although the correlation of exons and domains does not 
always hold, exon shuffling is undoubtedly a prevalent 
mechanism of the generation of complex genes encoding 
for multi-domain proteins, as evidenced by the genomic 
bias for exon-phase symmetry (1-3,5). Because inclusion/ 
exclusion of symmetric exons does not impair the transla- 
tion reading frame, they are favoured in shuffling reac- 
tions. Further, their incorporation into a recipient gene 
may result in a protein product of altered/improved func- 
tionality, provided they encode for autonomous struc- 
tural/functional units of proteins (domains) (5,6). 
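
The reading-frame argument above can be made concrete with a small Python sketch. It assumes the usual convention that the phase of an intron is the number of nucleotides of the interrupted codon that lie upstream of it (0, 1 or 2); the function names and the exon lengths are hypothetical and chosen only for illustration. Under this convention, an exon is symmetric exactly when its length is a multiple of three, so inserting or removing it leaves the downstream frame intact. The last function also illustrates the scenario suggested for MAP2, in which a phase 0 intron inserted into a 1,1 exon yields a 1,0-0,1 exon pair.

```python
from typing import List, Tuple


def end_phase(start_phase: int, exon_length: int) -> int:
    """Phase of the intron after an exon, given the phase of the intron before it."""
    if start_phase not in (0, 1, 2) or exon_length <= 0:
        raise ValueError("start_phase must be 0, 1 or 2 and exon_length positive")
    return (start_phase + exon_length) % 3


def is_symmetric(start_phase: int, exon_length: int) -> bool:
    """True if the exon is flanked by introns of the same phase (0,0; 1,1 or 2,2)."""
    return end_phase(start_phase, exon_length) == start_phase


def preserves_frame(exon_length: int) -> bool:
    """Inserting or skipping an exon keeps the downstream reading frame
    only if the exon length is a multiple of three."""
    return exon_length % 3 == 0


def split_by_intron(start_phase: int, exon_length: int, offset: int) -> List[Tuple[int, int]]:
    """Phase classes of the two exons created by a new intron `offset` nt into an exon."""
    if not 0 < offset < exon_length:
        raise ValueError("offset must fall strictly inside the exon")
    middle = end_phase(start_phase, offset)
    return [(start_phase, middle), (middle, end_phase(middle, exon_length - offset))]


if __name__ == "__main__":
    # Hypothetical 300-nt exon preceded by a phase 1 intron.
    print(is_symmetric(1, 300), preserves_frame(300))
    # A new intron 92 nt into this exon falls between two codons (phase 0).
    print(split_by_intron(1, 300, 92))
```

Symmetry and frame preservation coincide in this sketch by construction, which is the formal reason why symmetric exons are the natural units of shuffling.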

It occurred to us that this principle may be extended to 
structurally disordered proteins/regions. Because modu- 
larity in the form of motifs, linkers and disordered 
domains is often encountered in disordered proteins 
(13,22-24,26), and the incorporation of a disordered 
segment into a host protein might not impair the struc- 
tural integrity of the whole protein (15,34), we expected 
that the phase symmetry of exons and the structural 
disorder of the protein regions they encode correlate. By 
analysing the human genome, we found significant correl- 
ations, which suggest that the genetic potential of exon 
shuffling and the insertion/functional potential of struc- 
tural disorder act in synergy in the evolution of novel 
modular genes/proteins. 

The first explanation for this observed bias might be the 
preferential shuffling of symmetric exons encoding for 
structural disorder (e.g. because of functions associated 
with these regions or a particular base frequency bias). Of
course, 'exon shuffling', as we see it today, is the net result of
two mechanisms: the exchange of genetic material (which
we may call the actual shuffling, whether it occurs by gene
conversion, meiotic recombination or tandem duplication)
and subsequent selection for or against the new exon
(new gene), which is independent of the original genetic
mechanism and largely works on the viability/functional
fitness of the new protein product. It is reasonable to ask
whether shuffling of an exon preferably occurs if the exon 
is symmetric and/or encodes for a disordered region. It 
seems symmetric and asymmetric exons are shuffled 
alike because the mechanism of shuffling does not care 



Figure 4. Continued 

This protein contains multiple domains: MBD (yellow), three CXXC-type zinc-finger domains (dark blue) that mediate binding to non-methylated
CpG dinucleotides and a transcriptional repression domain (TRD) (pink). (B) Modular assembly of the family of fully disordered secretory Ca-binding
phosphoproteins, expressed in bone and teeth: DSPP, DMP1 and BSP. The enamel matrix proteins AMEL, AMBN and ENAM regulate the
deposition of inorganic phase in mineralized tissues (47). The family has a common ancestor from which gene duplication led to two clusters: the
enamel protein genes (AMBN-ENAM) and the bone-dentin protein genes (DSPP, DMP1, IBSP, MEPE and SPP1), in which diversification occurred
by the insertion of functional regions [signal peptide, SP (orange); kinase phosphorylation site, SXE (yellow); proline-rich, PP (blue); a proline-rich
phosphorylation site (olive) and an integrin-binding tripeptide, RGD (red)] and/or tandem duplications of the exons.



4420 Nucleic Acids Research, 2013, Vol. 41, No. 8 



about the position of the beginning and end of an exon 
(which is only defined in RNA, at the stage of splicing). 
Structural disorder also does not seem to matter much
because recombination hot spots, as defined by
transposable elements (TEs), such as long interspersed elements
(LINEs) and short interspersed elements (SINEs, e.g. Alu
repeats), correlate with genes that are involved in
processes related to external stimuli, immunity, cellular
signalling and transport (50), or metabolism, transport
and signalling (51). GC richness in the genome also
seems to correlate with TEs. None of these
features is strongly correlated with structural disorder,
i.e. preferential recombination driven by TEs is probably
not the primary mechanism responsible for the observed
preference of exon-phase symmetry and structural
disorder.

An integrated novel genetic element is much more likely 
to be selected because of its structural and functional com- 
patibility with the recipient gene and encoded protein. 
In the case of folded proteins, this requirement is thought
to be met if the exon encodes for a domain (or secondary
structure element) inserted at an appropriate point
(most often in a loop) (3,5,6). In the case of disordered
proteins (regions), this is not that much of an issue 
because they can easily accommodate multiple conform- 
ations (9,52) imposed by different end-point positions in 
the host protein. This is witnessed in alternative splicing, 
the conceptual equivalent of exon shuffling, which also 
inserts a novel segment into a protein and is facilitated 
by structural disorder (15,34). Novel symmetric exons 
brought into the gene by exon shuffling may equally well 
benefit, in an evolutionary sense, from encoded disorder. 

It is of equal importance, however, that the encoded 
region functionally integrates into the protein. Structural 
disorder has been linked with many functional attributes, 
such as uncoupling specificity from binding strength, 
adaptation to different partners, regulation by
post-translational modifications and rapid association in
binding reactions (8-10), and disordered regions often
harbour functional elements, such as SLiMs/ELMs,
linkers and domains (13,22,25,34,53). This is also suggested
by the observed length distribution of symmetric
exons encoding for disordered regions: they apparently 
have the capacity to encode distinct functional elements 
ranging from motifs to domains. Incorporation of these 
elements might add to the functional repertoire of the 
protein (changing activity, subcellular localization, 
protein-protein interactions and phase transitions) and 
advance the mechanism of exon shuffling. This is clearly
seen in the select examples, in which a symmetric
exon can encode for a SLiM/ELM (e.g. the integrin-binding
RGD motif in bone-dentin proteins), a disordered domain
(e.g. the inhibitory domain in CST) or a linker region (e.g. the
projection domain in MAP2). By looking at domains and
motifs overrepresented in symmetric disordered exons, we
also found that most of them contribute a novel
protein-protein interaction site that mediates interaction with
SH3 domains, nuclear transport receptors or some other
modular interaction domain [enabled/VASP homology
1 (EVH1) domain, TRAF domain] or serve as a
post-translational modification site (sumoylation).



Introduction of a disordered domain via a symmetric
exon may contribute more complex functionality
(chromatin rearrangement and enzyme inhibition). In all, it is
rather clear from the examples of overrepresentation that
their evolutionary inclusion provides a functional
advantage, which contributes to fixation of the shuffled
exon because it modulates the function of the protein by
affecting either the activity of the protein or its interactions
with partner proteins. This is fully in line with recent
observations based on comparing the human, fly and
yeast interactomes (36), in which disordered proteins/regions
are preferentially involved in rewiring interaction
patterns of proteins. Taken together, these examples suggest that
shuffling enabled by phase symmetry and structural
compatibility with the recipient protein because of structural
disorder are necessary but not sufficient conditions for
the evolutionary fixation of the shuffled exon: functional
compatibility of the novel motif/domain must
also come into the picture for lasting fixation.

Although its molecular mechanism is entirely different 
from exon shuffling, strong parallels of the structural and 
functional implications make alternative splicing pertinent 
to these points. Our results also show that alterna- 
tive splicing is significantly correlated with exon-phase 
symmetry and structural disorder. Previous studies also 
verified the correlation of alternative splicing with struc- 
tural disorder (15,34). Not unexpectedly, alternatively 
spliced regions are (also) enriched in functional motifs 
(23,34), and their absence/presence also promotes func- 
tional diversity of proteins and rewiring of the interactome 
(23,54,55). Intriguingly, structural disorder is a stronger 
feature distinguishing alternative exons from constitutive 
exons than symmetric exons from asymmetric ones, 
which suggests an even stronger influence in the case of 
alternative splicing (Table 3). The most likely reason is 
that constitutive exons have been shuffled and fixed only 
once, which is far less demanding on protein structure and 
function than alternative splicing, in the case of which 
both gene products have to be viable at the same time. 
This actually highlights the strength of structural disorder 
in making a region acceptable in a new gene product that 
arises as a result of the inclusion of a new exon. The
significantly weaker influence on symmetry/asymmetry status
than on alternative/constitutive status may also be
compatible with the toleration of alternatively spliced
asymmetric exons because of the compatibility of structural
disorder with frame shift, as established earlier (56).

These strong parallels, despite unrelated molecular 
mechanisms, might even imply an evolutionary link 
between exon shuffling and alternative splicing. An exon
selected for following exon shuffling, because of its phase
symmetry and encoded structural disorder, may also
have a better chance to be alternatively spliced. In fact,
it was observed that a large proportion of species-specific
exons (i.e. human exons that arose rather recently in
evolution) are also alternatively spliced (31-33). Apparently,
these younger exons have weaker splice sites and a lower
abundance of splicing regulators, which might point to a
deeper underlying correlation between exon shuffling and
alternative splicing; whether this holds remains to be seen.

A further interesting aspect of exon shuffling by virtue
of exon symmetry is the effective generation of tandem 
repeats. This is clearly the case in our select examples 
(e.g. AMBN; Figure 4B) and is statistically verified by 
the increased homology of subsequent symmetric exons. 
This mechanism is probably also promoted by functional 
selection because of the statistical overrepresentation of 
encoded SLiMs. It was found that the same SLiM often 
reappears in a protein, and SLiMs also often occur in 
tandem repeats (30,53). The ensuing functional
advantages of this arrangement are apparent because cognate
binding domains might also occur in tandem (57); thus,
repetition of the motif may result in increased avidity,
specificity and even complex regulatory phenomena based on
cross-talk between tandem binding and post-translational
modification sites (55). Specific functional attributes
of multiple adjacent post-translational modification sites
have been observed, for example, ultrasensitivity of
binding of the yeast Sic1 cell-cycle regulator to Cdc4, the
substrate-recognition subunit of its cognate E3 ubiquitin
ligase (58). A recent exciting development in the field of
intrinsically disordered proteins even suggests a physical 
perspective to this phenomenon because the interaction 
of repetitive motifs and repeated domains [for example, 
in the binding of Wiskott-Aldrich syndrome protein 
(WASP) to non-catalytic region of tyrosine kinase 
adaptor protein 1 (NCK) (59) or between low-complexity 
regions of RNA-binding proteins (60)] can cause a phase 
transition in the form of micrometre-sized liquid droplets. 
This transition depends on the valency of both partners, 
and it can help bridge the length scale of proteins
(ångströms) and that of organelles and cells (micrometres).
The transition can be regulated by post-translational 
modification(s), and it can regulate protein activity. 

In all, the effective operation of the mechanism put 
forward in this article may also shed some light on the 
advance of protein disorder in eukaryotic evolution. It
has often been stated that the occurrence of structural
disorder is higher in eukaryotes than in prokaryotes
(12,16-19), which is associated with its function in
signalling and regulation (12,16). The underlying genetic
mechanisms, however, have hardly ever been addressed. 
Here, we can add one prevalent mechanism, module 
exchange based on exon shuffling, which may have 
contributed to the eukaryotic success story of structural 
disorder. In effect, we might actually suggest that the 
potential of evolutionary creation of novel genes, as 
outlined in this article, can be added to the 'functional' 
advantages of structural disorder. 